Optimistic Bayesian Sampling in Contextual-Bandit Problems

Authors

  • Benedict C. May
  • Nathan Korda
  • Anthony Lee
  • David S. Leslie
Abstract

In every sequential decision problem in an unknown environment, the decision maker faces a dilemma over whether to explore to discover more about the environment or to exploit current knowledge. We address the exploration/exploitation dilemma in a general setting encompassing both standard and contextualised bandit problems. In this article we extend the approach of Thompson [13], which uses samples from the posterior distributions of the instantaneous value of each action, and introduce a new algorithm, Optimistic Bayesian Sampling (OBS), in which the probability of playing an action increases with the uncertainty in the estimate of that action's value. This results in better-directed exploratory behaviour. We prove that, under unrestrictive assumptions, both approaches result in optimal behaviour with respect to the average reward criterion of Yang and Zhu [15].
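To make the two sampling rules concrete, here is a minimal sketch on a Bernoulli bandit with Beta(1, 1) priors. The class and function names are illustrative, not from the paper; the OBS scoring rule shown (the maximum of a posterior sample and the posterior mean, so that uncertain arms are favoured) is a simplified reading of the optimistic modification described in the abstract.

```python
import random

class BetaArm:
    """Beta posterior over a Bernoulli arm's success probability."""

    def __init__(self):
        self.alpha = 1.0  # prior pseudo-count of successes
        self.beta = 1.0   # prior pseudo-count of failures

    def sample(self):
        # One draw from the posterior Beta(alpha, beta).
        return random.betavariate(self.alpha, self.beta)

    def mean(self):
        # Posterior mean estimate of the arm's value.
        return self.alpha / (self.alpha + self.beta)

    def update(self, reward):
        # Conjugate update for a 0/1 reward.
        self.alpha += reward
        self.beta += 1 - reward

def choose_thompson(arms):
    # Thompson Sampling: draw one posterior sample per arm and
    # play the arm whose sample is largest.
    return max(range(len(arms)), key=lambda i: arms[i].sample())

def choose_obs(arms):
    # OBS-style choice: score each arm by the maximum of its posterior
    # sample and its posterior mean. Samples can only raise a score
    # above the mean, so arms with wide posteriors get played more.
    return max(range(len(arms)),
               key=lambda i: max(arms[i].sample(), arms[i].mean()))
```

In this sketch the optimism is automatic: as an arm's posterior concentrates, its samples stay close to its mean and the two rules coincide, so exploration fades exactly where uncertainty fades.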


Similar articles

Simulation Studies in Optimistic Bayesian Sampling in Contextual-Bandit Problems

This technical report accompanies the article “Optimistic Bayesian Sampling in Contextual-Bandit Problems” by B.C. May, N. Korda, A. Lee, and D.S. Leslie [3].


Optimistic Bayesian Sampling in Contextual-Bandit Problems

In sequential decision problems in an unknown environment, the decision maker often faces a dilemma over whether to explore to discover more about the environment, or to exploit current knowledge. We address the exploration-exploitation dilemma in a general setting encompassing both standard and contextualised bandit problems. The contextual bandit problem has recently resurfaced in attempts to...


Thompson Sampling for Contextual Bandits with Linear Payoffs

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to state-of-the-art methods. However, many questions regarding its theoretical performance remained open. In this paper, we d...


Taming Non-stationary Bandits: A Bayesian Approach

We consider the multi-armed bandit problem in non-stationary environments. Based on the Bayesian method, we propose a variant of Thompson Sampling which can be used in both rested and restless bandit scenarios. Applying discounting to the parameters of the prior distribution, we describe a way to systematically reduce the effect of past observations. Further, we derive the exact expression for the ...
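The discounting idea mentioned in this abstract can be sketched for a Beta-Bernoulli arm: shrink the posterior's pseudo-counts toward the prior before each update, so the posterior gradually forgets old observations and can track a drifting arm. The function name and the discount factor `gamma` are illustrative assumptions, not notation from that paper.

```python
def discounted_update(alpha, beta, reward, gamma=0.95):
    """Update Beta(alpha, beta) with a 0/1 reward, discounting history.

    Multiplying both pseudo-counts by gamma in (0, 1) before adding the
    new observation caps the effective sample size at 1 / (1 - gamma),
    so the posterior never becomes too confident to adapt.
    """
    alpha = gamma * alpha + reward
    beta = gamma * beta + (1 - reward)
    return alpha, beta
```

With `gamma = 1` this reduces to the standard conjugate update; smaller values of `gamma` trade statistical efficiency in stationary phases for faster adaptation after a change.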


A Practical Method for Solving Contextual Bandit Problems Using Decision Trees

Many efficient algorithms with strong theoretical guarantees have been proposed for the contextual multi-armed bandit problem. However, applying these algorithms in practice can be difficult because they require domain expertise to build appropriate features and to tune their parameters. We propose a new method for the contextual bandit problem that is simple, practical, and can be applied with...




Publication date: 2011